ABSTRACT

We carry out a systematic search for extremely metal poor (XMP) galaxies in the
spectroscopic sample of the Sloan Digital Sky Survey (SDSS) data release 7 (DR7). The XMP
candidates are found by classifying all the galaxies according to the form of their spectra
in a region 80 Å wide around Hα. Due to the data size, the method requires an automatic
classification algorithm; we use k-means. Our systematic search renders 32 galaxies having
negligible [NII] lines, as expected in XMP galaxy spectra. Twenty-one of them have been
previously identified as XMP galaxies in the literature; the remaining eleven are new. This
was established after a thorough bibliographic search that yielded only some 130 galaxies
known to have an oxygen metallicity below one tenth of the solar value (explicitly, with
12 + log(O/H) < 7.65). XMP galaxies are rare; they represent 0.01% of the galaxies with
emission lines in SDSS/DR7. Although the final metallicity estimate of all candidates
remains pending, strong-line empirical calibrations indicate a metallicity about one-tenth
solar, with the oxygen metallicity of the twenty-one known targets being
12 + log(O/H) ≃ 7.61 ± 0.19. Since the SDSS catalog is limited in apparent magnitude, we
have been able to estimate the volume number density of XMP galaxies in the local universe,
which turns out to be (1.32 ± 0.23) · 10^-4 Mpc^-3. The XMP galaxies constitute 0.1% of the
galaxies in the local volume, or ~ 0.2% considering only emission line galaxies. All but
four of our candidates are blue compact dwarf galaxies (BCDs), and 24 of them either have
cometary shape or are formed by chained knots.

Subject headings: methods: data analysis - galaxies: abundances - galaxies: 
formation - galaxies: starburst - galaxies: statistics 
1. Introduction



The evolution of galaxies involves the birth and death of stars; galaxies therefore
unavoidably produce metals as they live. Thus, galaxies with very low metal content are
probably unevolved objects, and if we find them nearby, they provide a readily accessible
fossil record of the early universe. These objects are expected in the paradigm of
hierarchical galaxy formation, where large galaxies arise through the assembly of smaller
ones in an inefficient process that leaves many dwarf galaxies as remnants (e.g.,
Klypin et al. 1999; Balogh et al. 2001; Diemand et al. 2007). They seem to materialize as
the extremely metal-poor (XMP) dwarf galaxies observed today which, consequently, would be
the closest examples we can find of the elementary primordial units from which larger
galaxies assembled (e.g., Izotov & Thuan 2004a, 2007). Those units must have been extremely
common in the past, but they cannot be directly observed at high redshift. Nearby low
metallicity galaxies offer a chance for detailed studies that are otherwise impossible.
Studies of their interstellar medium (ISM) can shed light on the properties of the
primordial ISM at the time of galaxy formation (Izotov & Thuan 2007). For example, even the
most metal-deficient galaxies in the local universe formed from matter already enriched by
an early star formation episode, and the determination of the minimum galactic metallicity
seems to be the best constraint available on these first stars (e.g., Bromm & Larson 2004;
Thuan & Izotov 2005). Because they have not undergone much chemical evolution, these
galaxies are also the best objects for the determination of the primordial He abundance
that constrains cosmological models (e.g., Peimbert & Torres-Peimbert 1974;
Pagel et al. 1992; Izotov & Thuan 2004b).



The known XMP galaxies tend to be blue compact dwarf galaxies (BCDs; e.g.,
Kunth & Östlin 2000; Amorín et al.; Papaderos et al. 2008). It is so far unclear whether
this preference of XMP galaxies for being BCDs is genuine, or whether the association
results from an observational bias that systematically disfavors low surface brightness
objects. The best XMP candidates in the local universe are BCDs, but metal-poor galaxies
are found among other types of dwarf galaxies as well (see Kunth & Östlin 2000).



Unfortunately, XMP galaxies are rare. The review by Kunth & Östlin (2000) cites only
31 targets with metallicity below one tenth the solar value, which is the threshold
customarily used to define XMP galaxies. For decades I Zw 18 held the record of lowest
metallicity (Sargent & Searle 1970), and although a few other examples have been recently
found (e.g., Izotov et al. 2005, 2009), there is a minimum metallicity close to that of
I Zw 18, which corresponds to a few hundredths the solar value. The existence of such a
metallicity threshold is suggestive of a pre-galactic origin of metals, as happens with
halo stars (e.g., Spite & Spite 1982), but it may also be due to other effects, such as the
early self-contamination of the ongoing starburst that raises any original level to a
minimum metallicity (e.g., Kunth & Sargent 1986), or even the technical difficulty of
metallicity determinations below a threshold (Papaderos et al. 2008). The number of known
low metallicity galaxies has significantly increased since the work by
Kunth & Östlin (2000), but they are still rare objects. The thorough bibliographic
compilation described in § 4 shows only 129 such targets. The shortage of low metallicity
galaxies is partly a consequence of their low luminosity, as expected from the
luminosity-metallicity relationship (e.g., Lequeux et al. 1979; Skillman et al. 1989;
Tremonti et al. 2004). They must be faint and so detectable only within a very local volume.



In order to enlarge the list of these rare yet interesting objects, we have carried out
an automatic search for low metallicity galaxies in the seventh Sloan Digital Sky Survey
data release (SDSS/DR7). The work is reported here. As far as we are aware, this is the
first systematic search of this kind on SDSS/DR7, even though extensive searches in earlier
SDSS data releases have been reported (e.g., Izotov et al. 2006a; Guseva et al. 2009). The
advantage of an orderly search over the more traditional serendipitous discovery is
twofold. First, it maximizes the number of potential candidates. Second, the bias of the
selection is quantifiable, allowing us to estimate for the first time the volume number
density of XMP galaxies in the local universe.

Low metallicity galaxies are characterized by having a [NII]λ6583 line negligibly small
compared to Hα. Thus, the ratio between [NII]λ6583 and Hα is used to measure metallicities
through the appropriate calibration (e.g., Denicoló et al. 2002; Pettini & Pagel 2004). A
low value of this ratio has been imposed as a necessary condition in classical works
searching for XMP galaxies (e.g., Izotov et al. 2006a). Building on this classical
approach, we address the problem in an original way by automatically classifying the
galaxies according to the shape of their spectra in a region around Hα. We expected the
classification to automatically separate classes of galaxies whose spectra present this
property, and those targets would be regarded as metal poor candidates. Note that our
approach does not require detailed knowledge of the spectral properties of the XMP
galaxies; e.g., we do not have to specify a particular ratio of [NII]λ6583 to Hα for a
galaxy to be included. Such thresholds are determined by the classification algorithm in
view of the existing spectra. This minimum need for prior knowledge makes the search novel
and robust against uncertainties in the selection criteria. The above conjecture turned out
to work, and the result of the study is presented here. We employ a robust classification
algorithm called k-means, commonly used in data mining (e.g., Everitt 1995), which we have
already successfully applied to sort out different types of astronomical spectra, spanning
from polarization profiles in the Sun (Sánchez Almeida & Lites 2000;
Viticchié & Sánchez Almeida 2011) to galaxy spectra (Sánchez Almeida et al. 2009, 2010).



The paper is organized as follows. First, we describe the systematic search for low
metallicity galaxies (§ 2). SDSS/DR7 renders 32 XMP candidates. The physical properties
of the galaxies thus selected are analyzed in § 3. These candidates, together with the rest
of the SDSS/DR7 galaxies, allow us to compute the volume number density of low metallicity
galaxies in the local universe (§ 5). In order to put our work in context, we carried out a
comprehensive search for XMP galaxies in the literature. The results are given in § 4. A
summary with conclusions and follow-up work is provided in § 6.



2. K-means based search for galaxies of low metallicity 



As mentioned in the introduction, the spectrum around Hα is very sensitive to
metallicity. The ratio between the equivalent widths of [NII]λ6583 and Hα goes to zero with
decreasing metallicity, as calibrated by, e.g., Denicoló et al. (2002) or
Pettini & Pagel (2004). We make use of this sensitivity to select low metallicity galaxies
by classifying all the galaxies with spectra in SDSS/DR7 according to their shape around
Hα. Those classes containing spectra where [NII]λ6583 turns out to be negligibly small with
respect to Hα are saved as candidates. For detailed information on the SDSS spectral
catalog, see Stoughton et al. (2002a), Abazajian et al. (2009), as well as the
comprehensive SDSS/DR7 web site^1. The main properties of the catalog affecting our
analysis are the spectral resolution, some 2000 at Hα, and the fact that it includes all
galaxies up to an integrated r magnitude of 17.8.
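
As a rough illustration of what an apparent magnitude limit of r = 17.8 implies, the
absolute magnitude reachable at a given redshift follows from the distance modulus. The
sketch below is only indicative: it assumes a linear Hubble law with H0 = 70 km/s/Mpc and
ignores K-corrections and extinction. None of these assumptions is stated in the text, and
the function name is hypothetical.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
H0 = 70.0             # ASSUMED Hubble constant in km/s/Mpc (not given in the text)


def limiting_absolute_magnitude(z, m_lim=17.8, h0=H0):
    """Absolute magnitude of a galaxy seen at apparent magnitude m_lim.

    Uses the low-redshift approximation d = c z / H0 and ignores
    K-corrections and extinction (simplifying assumptions).
    """
    d_mpc = C_KM_S * z / h0                                   # distance in Mpc
    distance_modulus = 5.0 * math.log10(d_mpc * 1.0e6 / 10.0)  # distance in units of 10 pc
    return m_lim - distance_modulus


if __name__ == "__main__":
    for z in (0.01, 0.05, 0.25):
        print(f"z = {z:4.2f}  ->  M_r limit = {limiting_absolute_magnitude(z):6.2f}")
```

Under these assumptions only galaxies brighter than roughly M_r = -22.3 would enter the
spectroscopic sample at redshift 0.25, which is why intrinsically faint targets can only be
found nearby.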

Since the classification must be based on the shape of the spectrum rather than on some
incidental property (e.g., the galaxy luminosity), the spectra must first go through a
normalization procedure. The original spectra are shifted to restframe wavelength using
SDSS redshifts, and then normalized to the flux in the g color filter. In addition, the
continuum around Hα is subtracted, and this spectrum is re-normalized to the peak intensity
of [NII]λ6583. Specifically, if I(λ) stands for the spectrum in restframe wavelengths and
normalized to g, then the spectrum to be classified is S(λ),

    S(\lambda) = \frac{I(\lambda) - I_c(\lambda)}{\left| I(6583\,\mathrm{\AA}) - I_c(6583\,\mathrm{\AA}) \right|},    (1)

where λ stands for the wavelength, and I(6583 Å) is the intensity at the extreme of
[NII]λ6583, i.e., the maximum if the line is in emission and the minimum if it is in
absorption. The continuum intensity I_c(λ) is derived by linear interpolation of the
spectrum in two clean windows at both sides of the spectral region of interest, namely,
from 6400 Å to 6460 Å, and from 6610 Å to 6670 Å. Note that the denominator of equation (1)
is always positive, so that lines in absorption remain in absorption after normalization.
We arrived at this normalization after some alternative trials. The normalization to
[NII]λ6583 turns out to maximize the contrast, since this line goes to zero for the low
metallicity galaxies we are trying to identify. Figure 1 summarizes the normalization
procedure. It includes the original spectrum (the black solid line) as well as the two
continuum windows used to determine the continuum intensity. After normalization (i.e.,
after continuum removal plus division by the maximum intensity at [NII]λ6583) it becomes
the dashed line, which is the 80 Å wide spectral range undergoing classification.

^1 http://www.sdss.org/dr7
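
The normalization of equation (1) can be sketched as follows. This is a minimal
illustration, not the pipeline actually used: the continuum is taken as the straight line
through the mean points of the two windows quoted above, which is one possible reading of
"linear interpolation", and the half-width used to locate the [NII]λ6583 extreme is a
hypothetical parameter.

```python
import numpy as np

# Continuum windows and line position quoted in the text (rest-frame Angstrom)
BLUE_WINDOW = (6400.0, 6460.0)
RED_WINDOW = (6610.0, 6670.0)
NII_LINE = 6583.0


def normalize_spectrum(wavelength, flux, half_width=3.0):
    """Return S = (I - I_c) / |I(6583) - I_c(6583)| for one rest-frame spectrum.

    half_width (Angstrom) sets the interval searched for the [NII] extreme;
    it is an illustrative choice, not a value given in the text.
    """
    wavelength = np.asarray(wavelength, dtype=float)
    flux = np.asarray(flux, dtype=float)

    # Mean wavelength and mean flux in each clean window give two anchor points
    anchors_x, anchors_y = [], []
    for lo, hi in (BLUE_WINDOW, RED_WINDOW):
        mask = (wavelength >= lo) & (wavelength <= hi)
        anchors_x.append(wavelength[mask].mean())
        anchors_y.append(flux[mask].mean())

    # Straight line through the two anchors, evaluated at every wavelength
    slope = (anchors_y[1] - anchors_y[0]) / (anchors_x[1] - anchors_x[0])
    continuum = anchors_y[0] + slope * (wavelength - anchors_x[0])
    residual = flux - continuum

    # Extreme of [NII]: maximum in emission, minimum in absorption
    near_line = np.abs(wavelength - NII_LINE) <= half_width
    local = residual[near_line]
    extreme = local[np.argmax(np.abs(local))]

    # Dividing by the absolute value keeps absorption lines in absorption
    return residual / np.abs(extreme)
```

With this normalization the [NII]λ6583 extreme is ±1 by construction, so a metal-poor
spectrum stands out through a very large normalized Hα peak.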

We employ the k-means algorithm for classification. It treats the spectra as vectors in
an n-dimensional space, with n the number of wavelengths. It is a rather standard technique
in data mining, machine learning, and artificial intelligence (e.g., Everitt 1995;
Bishop 2006), and we have already successfully employed it for massive classification of
galaxy spectra (Sánchez Almeida et al. 2008, 2010). It works iteratively, starting from
randomly chosen cluster centers. Each galaxy spectrum is assigned to the cluster center
that is closest in a least-squares sense. Then the cluster centers are recomputed as the
average spectrum of all the spectra in the class, and the assignment begins again. The
algorithm ends when no spectra are re-classified in two successive iterations. The main
advantages are: (1) it is fast, simple, and robust, as required to classify large data
sets, (2) it guarantees that similar spectra end up in the same class, (3) it automatically
yields the number of classes, and (4) it provides spectra that are characteristic of the
classes. These template spectra are just the average of all the spectra in the class, and
they can be analyzed and interpreted as individual galaxies under the same assumptions
followed when applying the popular stacking technique (e.g., Ellison et al. 2000). The
drawbacks are: (1) the starting random seeds influence the classification, and this effect
must be followed up and, eventually, corrected for, and (2) the classes do not necessarily
represent individual clusters in the classification space, but may be parts of clusters.
The latter downside is not a problem in our particular application, since we are not
interested in clustering but in separating spectra with different shapes. As for the
former, one has to carry out several independent trials to test the robustness of the
classes against the initialization.
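
The iteration just described can be sketched in a few lines. This is a plain textbook
k-means, given only to fix ideas: the number of classes is a fixed input here, whereas the
procedure used in the paper is said to yield it automatically, and how that is done is not
described in this section. Function and argument names are illustrative.

```python
import numpy as np


def kmeans(spectra, n_classes, seed=0, max_iter=100):
    """Plain k-means on an (n_spectra, n_wavelengths) array.

    Returns the class label of each spectrum and the class templates
    (the average spectrum of each class).
    """
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)

    # Start from randomly chosen spectra as cluster centers
    start = rng.choice(len(spectra), size=n_classes, replace=False)
    centers = spectra[start].copy()
    labels = np.full(len(spectra), -1)

    for _ in range(max_iter):
        # Squared Euclidean distance of every spectrum to every center
        dist2 = ((spectra[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                      # no spectrum changed class: converged
        labels = new_labels
        # Recompute each center as the average spectrum of its class
        for k in range(n_classes):
            members = spectra[labels == k]
            if len(members) > 0:       # keep the old center if a class empties
                centers[k] = members.mean(axis=0)
    return labels, centers
```

The distance matrix built here holds n_spectra × n_classes × n_wavelengths numbers, so
for a sample of SDSS size the assignment step would have to be done in chunks; the sketch
favors clarity over memory use.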

We apply k-means to the galaxies with spectra having redshift < 0.25, which is
equivalent to classifying all the ~ 9 · 10^5 galaxies with spectra in SDSS/DR7, since the
low metallicity targets are expected to be faint (§ 1) and, consequently, they cannot be
observed at high redshift given the apparent magnitude threshold imposed by SDSS. (For a
galaxy at redshift > 0.25 to be observed spectroscopically, the absolute r magnitude must
be smaller than -22.3.) The k-means procedure is applied in two successive steps, because a
single application renders classes too coarse to separate low metallicity galaxies. In the
first application, we remove classes having Hα in absorption, as well as those whose
[NII]λ6583 is much too large. Actually, the limit is not imposed directly on [NII]λ6583,
but on the equivalent oxygen abundance deduced via the so-called N2-method (e.g.,
Denicoló et al. 2002; Shi et al. 2005), which depends exclusively on the ratio between the
equivalent widths of [NII]λ6583 and Hα,

    \mathrm{N2} = \log\left( W_{[\mathrm{NII}]\lambda 6583} / W_{\mathrm{H}\alpha} \right).    (2)

In the calibration by Pettini & Pagel (2004), the oxygen abundance is

    12 + \log(\mathrm{O/H}) \simeq 8.90 + 0.57\,\mathrm{N2},    (3)

or alternatively,

    12 + \log(\mathrm{O/H}) \simeq 9.37 + 2.03\,\mathrm{N2} + 1.26\,\mathrm{N2}^2 + 0.32\,\mathrm{N2}^3.    (4)



Galaxies belonging to classes whose template spectrum has 12 + log(O/H) < 8.2 were used
for a second k-means run. This arbitrary threshold was chosen as a trade-off that removes
enough high metallicity objects, yet allows the second classification to choose from a
broad enough pool. The subset undergoing the second classification contains some 5000
galaxies, which correspond to only 0.6% of the original set. This second classification
renders classes with a spectrum characteristic of low metallicity, i.e., with
Hα ≫ [NII]λ6583. Several templates of the classes thus obtained are included in Fig. 2. It
shows genuine low metallicity classes (e.g., class #14) as well as some others where the
SDSS skyline removal pipeline has artificially cut out [NII]λ6583 (e.g., class #15). The
finding of these fake low metallicity galaxies shows a weakness of the search technique:
some SDSS/DR7 galaxies with negligible [NII]λ6583 may not be low metallicity after all, and
the selection must be refined even further. However, this downside incidentally proves the
procedure to be working properly, since we know it managed to identify spectra without
[NII]λ6583. The problem arises as to how to separate genuine from fake low metallicity
galaxies. Fortunately, it can be sorted out easily, as we explain below. Figure 3 shows the
oxygen abundance corresponding to the templates of the classes inferred from this second
step, discarding those classes where the template shows obvious malfunctioning of the SDSS
pipeline (i.e., like class #15 in Fig. 2). The abundances are based on the two slightly
different calibrations by Pettini & Pagel (2004) given in equations (3) and (4). We use
them to select the galaxies in classes #11 and #14 as XMP galaxy candidates. The selection
criteria are somewhat arbitrary, but we choose these two classes because (1) the
metallicities estimated for the templates using the N2-method are smaller than one-tenth
solar^2, which is the reference value for XMP galaxies (§ 1), (2) there



^Or VFHft/y [NTT]xfiR«.s > 14 using the calib r ation in equation (jj]) and assuming 12 + log(O/H)0 — 8.65 
lAsplundlbooil) . as invoked bv lPettini fc Pagell |2004l ). 
Fig. 1. Spectral region classified to search for metal poor galaxies, which should have
[NII]λ6583 small compared to the nearby Hα line. The original spectrum (the black solid
line) is normalized before classification, rendering the (red) dashed line. The
normalization includes removing the continuum, and dividing the residual by the maximum
intensity of [NII]λ6583. The blue solid bars at the extremes mark the intervals used to
determine the continuum. The central 80 Å wide bar embraces the spectral region of
interest. Wavelengths λ are in Å, and the spectra are given in arbitrary units.




Fig. 2. Several templates of the k-means classification characterized by having
Hα ≫ [NII]λ6583, which may correspond to classes of low metallicity galaxies. The
individual spectra have their continua removed, and have been normalized to the peak
intensity of [NII]λ6583. In order to be plotted on a logarithmic scale, the spectra have
been artificially shifted upward by one unit. These classes include spectra where the SDSS
pipeline has artificially removed [NII]λ6583, creating a mock low metallicity galaxy (e.g.,
class #15 is mostly formed by these spectra). Wavelengths are given in Å.
seems to be a gap in metallicity between these two classes and the rest (see Fig. 3), and
(3) the number of galaxies the classes contain is small enough to allow a detailed
inspection of the individual spectra. The original 49 spectra included in these two classes
were visually inspected, revealing that 20 of them were faults produced by the SDSS
pipeline. These spectra are easily recognized because the pipeline linearly interpolates
the spectrum around [NII]λ6583, and the presence of such a straight line contrasts sharply
with the rest of the spectrum. They were excluded. Since faulty spectra ended up in
low-metallicity classes, we checked the faulty classes for true XMP spectra that might have
sneaked in. Most of the spectra were indeed failures of the pipeline, but we rescued three
spectra with particularly low [NII]λ6583 included in these classes. All in all, our
selection rendered 32 galaxies. Their coordinates and the unique identifier of the galaxy
spectrum in SDSS/DR7 are listed in Table 1. The table also includes a column with the
oxygen metallicity inferred by applying equation (4) to the equivalent widths measured on
the individual SDSS spectra. These values are only tentative, since the prescription has an
intrinsic scatter as large as 0.3 dex for individual galaxies (e.g., Pettini & Pagel 2004;
Shi et al. 2005; Pérez-Montero & Contini 2009). Keeping this caveat in mind, the mean
12 + log(O/H) of the candidates is 7.60, with a standard deviation of 0.22. As we will see
in § 4, this estimate is in agreement with the metallicity of the subset of candidates
whose abundance has been derived by more precise means.
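
For reference, the N2 index of equation (2) and the two calibrations of equations (3) and
(4) can be evaluated as in the sketch below, which also recomputes the sample statistics
quoted above from the N2-based column of Table 1. The function names are illustrative, and
the list of abundances is transcribed from the table as read here; any transcription slip
would propagate to the numbers.

```python
import math
import statistics


def n2_index(w_nii, w_halpha):
    """N2 = log10(W_[NII]6583 / W_Halpha), equation (2)."""
    return math.log10(w_nii / w_halpha)


def oh_linear(n2):
    """Equation (3): 12 + log(O/H) ~ 8.90 + 0.57 N2."""
    return 8.90 + 0.57 * n2


def oh_cubic(n2):
    """Equation (4): 12 + log(O/H) ~ 9.37 + 2.03 N2 + 1.26 N2^2 + 0.32 N2^3."""
    return 9.37 + 2.03 * n2 + 1.26 * n2 ** 2 + 0.32 * n2 ** 3


# N2-based 12 + log(O/H) of the 32 candidates, in the order of Table 1
TABLE1_OH = [
    7.64, 7.60, 7.75, 7.48, 7.82, 7.36, 7.75, 7.58,
    6.88, 7.67, 7.65, 7.61, 7.52, 7.01, 7.75, 7.71,
    7.58, 7.83, 7.69, 7.54, 7.78, 7.73, 7.21, 7.67,
    7.64, 7.86, 7.64, 7.67, 7.74, 7.74, 7.64, 7.57,
]

mean_oh = statistics.mean(TABLE1_OH)    # sample mean
std_oh = statistics.stdev(TABLE1_OH)    # sample standard deviation (n - 1)

if __name__ == "__main__":
    n2 = n2_index(1.0, 100.0)           # [NII] one hundred times weaker than Halpha
    print(f"N2 = {n2:.2f}: eq. (3) -> {oh_linear(n2):.2f}, eq. (4) -> {oh_cubic(n2):.2f}")
    print(f"Table 1: mean = {mean_oh:.2f}, standard deviation = {std_oh:.2f}")
```

The two calibrations differ by only a few hundredths of a dex at N2 = -2, well below the
0.3 dex intrinsic scatter mentioned above.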

K-means renders final classes that depend on the random initialization. This problem
is not particularly severe in our application, since we are not trying to find clusters. We
just try to separate spectra with a particular shape, regardless of whether they appear in
a single class or in several classes. Therefore, we did not expect the random
initialization to represent a serious problem, yet we studied its impact on the selected
spectra by repeating the k-means classification 5 times. Classes with one-tenth solar
metallicity are always present. They contain most of the 49 spectra in the original
classification that were purged later on. Specifically, the two classes of lowest
metallicity always include between 79% and 93% of the spectra in the classification.
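
One simple way to quantify this kind of robustness is to measure, for each repeated run,
the fraction of the reference selection that reappears in the lowest-metallicity classes.
The sketch below assumes that the spectra are labeled by some identifier; the helper name
and the toy identifiers in the usage example are invented for illustration and are not
data from the paper.

```python
def recovered_fraction(reference_ids, trial_ids):
    """Fraction of the reference spectra that a repeated run also selects."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference selection is empty")
    return len(reference & set(trial_ids)) / len(reference)


if __name__ == "__main__":
    # Toy identifiers only: 49 reference spectra, 44 of them found again
    reference = range(49)
    trial = range(5, 60)
    print(f"recovered fraction = {recovered_fraction(reference, trial):.2f}")
```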

Our search is systematic. If SDSS/DR7 has spectra with Hα ≫ [NII]λ6583, they will
appear in one of the classes. The assumption that we detect all the spectra with this
property is used in § 5 to estimate the number density of XMP galaxies in the local
universe. Deviations from this assumption are also analyzed in § 5.



3. Properties of the XMP candidates 



As pointed out in § 1, the known XMP galaxies tend to be BCD galaxies, which are
characterized by blue colors, high surface brightness, and low luminosity. Obviously, the
Table 1. XMP candidates found by classifying all SDSS/DR7 galaxies according to their
spectra around Hα.

Index(a)  Name                      g     Redshift  12+log(O/H)(d)  12+log(O/H)(e)  SpecObjID(b)        Comment(c)
1         SDSS J003630.40+005234.6  18.8  0.028     7.64            -               194442194734546944  single knot
2*        SDSS J012534.19+075924.4  18.4  0.010     7.60            7.60            655785970125242368  knotted cometary
3         SDSS J015809.39+000637.2  18.1  0.012     7.75            -               302812946306695168  cometary
4*        SDSS J030331.27-010947.1  19.7  0.030     7.48            7.22            225967511814799360  cometary
5         SDSS J031300.05+000612.1  19.2  0.029     7.82            -               299996706848636928  cometary
6*        SDSS J080840.85+172856.4  19.2  0.044     7.36            7.48            585978593608204288  single knot
7         SDSS J082540.45+184617.2  19.0  0.038     7.75            -               640023302407979008  single knot
8         SDSS J084236.58+103313.9  17.7  0.011     7.58            -               724467307371298816  cometary
9*        SDSS J093402.03+551427.7  16.4  0.002     6.88            7.17            156443095175528448  2-knot cometary
10        SDSS J094254.27+340411.8  19.1  0.023     7.67            -               547698126572486656  cometary?
11        SDSS J??????.66+450457.7  17.5  0.009     7.65            -               265375788289753088  single knot
12*       SDSS J101624.52+375445.9  15.9  0.004     7.61            7.58            401892408779866112  cometary
13*       SDSS J103137.28+043422.0  16.2  0.004     7.52            7.70            162635977557278720  cometary
14*       SDSS J104457.80+035313.1  17.5  0.013     7.01            7.44            162917331083722752  cometary
15*       SDSS J111934.34+513012.1  16.8  0.004     7.75            7.51            247359886311030784  2-knot cometary
16        SDSS J114506.25+501802.4  17.8  0.006     7.71            -               272412374642720768  cometary
17*       SDSS J115132.94-022222.0  16.7  0.004     7.58            7.78            93111671593107456   2-knot cometary
18        SDSS J115754.18+563816.7  16.9  0.001     7.83            -               369803376499621888  cometary
19*       SDSS J120122.31+021108.3  17.6  0.003     7.69            7.49            145464500043120640  cometary
20*       SDSS J121402.48+534517.4  17.3  0.003     7.54            7.64            287049377199947776  cometary
21*       SDSS J123048.60+120242.8  16.7  0.004     7.78            7.73            454810434122285056  cometary
22*       SDSS J125526.07-021334.0  19.1  0.052     7.73            7.83            95080395053203456   single knot
23*       SDSS J132347.46-013252.0  18.1  0.023     7.21            7.78            96204976459612160   single knot
24        SDSS J132723.29+402204.1  19.0  0.012     7.67            -               412307391283986432  single knot
25*       SDSS J133126.91+415148.3  17.1  0.012     7.64            7.75            412307391565004800  cometary?
26*       SDSS J135525.66+465151.3  19.2  0.028     7.86            7.63            361921789577658368  cometary?
27        SDSS J141851.13+210239.7  17.6  0.009     7.64            -               784423532984532992  cometary
28*       SDSS J142342.88+225728.7  17.8  0.033     7.67            7.72            600334402043510784  single knot
29*       SDSS J150934.17+373146.1  17.3  0.033     7.74            7.85            394011865673367552  cometary?
30*       SDSS J164710.66+210514.5  17.2  0.009     7.74            7.75            442143986740625408  knotted cometary
31*       SDSS J223831.12+140029.7  18.9  0.021     7.64            7.45            533060792673632256  2-knot cometary
32*       SDSS J230210.00+004938.8  18.7  0.033     7.57            7.71            190784502518251520  2-knot cometary

(a) Indexes marked with an asterisk correspond to known XMP galaxies. The rest are new.
(b) Unique identifier of the galaxy spectrum in SDSS/DR7.
(c) Sketch of galaxy shape.
(d) Using the calibration by Pettini & Pagel (2004), given in our equation (4).
(e) From the literature, as listed in Table 2.
metal-poor galaxies represent only a small fraction of the BCDs. In order to explore the
physical properties of our candidates, we use the diagnostics developed to identify BCDs
in large galaxy samples, to see whether our candidates conform to the properties of known
XMP galaxies. Following Sánchez Almeida et al. (2008), who condense the criteria by
Gil de Paz et al. (2003) and Malmberg (2005), BCDs can be observationally characterized
by,



(a) μ_g - μ_r < 0.43 mag arcsec^-2,

(b) μ_g < 21.83 - 0.47 (μ_g - μ_r) mag arcsec^-2,

(c) M_g > -19.12 + 1.72 (M_g - M_r) mag,

(d) W_Hα > 50 Å,

(e) 12 + log(O/H) < 8.2 (~ 1/3 solar),

(f) no AGN,

(g) no bright galaxy within 10 R_50.



The various new symbols have their usual meaning: μ_g and μ_r stand for the mean surface
brightness in the SDSS filters g and r, M_g and M_r represent the absolute magnitudes in
these two filters, and R_50 is the radius containing 50% of the galaxy light. The criteria
above assure that (a) BCDs are blue, (b) they have high surface brightness, (c) they are
dwarf, (d) they have a large star-formation rate, (e) they are metal poor, (f) they are not
confused with active galactic nuclei (AGN), and (g) they have no close companions. Figure 4
shows various scatter plots relating the physical parameters involved in BCD
characterization. The required colors and sizes have been taken directly from the SDSS/DR7
database using Petrosian magnitudes (Stoughton et al. 2002a; Abazajian et al. 2009). The
color-magnitude plot in Fig. 4 shows that all our XMP candidates are dwarfs; the dashed
line represents constraint (c). Their Hα equivalent width exceeds the required threshold
(constraint d), and the galaxies reside in a region of the BPT diagram (after
Baldwin et al. 1981) that rules out an AGN nature; the dashed line in Fig. 4 divides AGN
activity from star-forming activity as worked out by Kauffmann et al. (2003). The XMP
candidates obviously fulfill criterion (e) (see Fig. 3), and they are also blue enough,
staying below the horizontal dashed line in Fig. 4 corresponding to criterion (a). As far
as the compactness criterion (b) is concerned, most galaxies comply with it, being to the
left of the slanted dashed line in Fig. 4. Four of them do not; they are represented as
asterisks in that panel, as well as in the remaining panels of the figure. The low surface
brightness of these galaxies is, to some extent, only apparent. Visual inspection of their
SDSS composite-color images shows two of them to present a marked cometary shape or double
nucleus, and the SDSS reduction pipeline has probably overestimated the apparent size of
those galaxies.
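
The criteria (a) to (g) translate directly into a set of boolean tests. The sketch below
shows one way to apply them to a single galaxy; the class, its field names, and the numbers
in the usage example are hypothetical, and the inequalities are written as read from the
list above.

```python
from dataclasses import dataclass


@dataclass
class Galaxy:
    mu_g: float                 # mean surface brightness in g (mag arcsec^-2)
    mu_r: float                 # mean surface brightness in r (mag arcsec^-2)
    abs_g: float                # absolute magnitude M_g
    abs_r: float                # absolute magnitude M_r
    w_halpha: float             # Halpha equivalent width (Angstrom)
    oh: float                   # 12 + log(O/H)
    is_agn: bool                # True if the BPT diagram indicates an AGN
    has_bright_neighbor: bool   # True if a bright galaxy lies within 10 R_50


def bcd_checks(g):
    """Evaluate criteria (a)-(g) one by one and return them by name."""
    color_mu = g.mu_g - g.mu_r
    return {
        "a_blue": color_mu < 0.43,
        "b_high_surface_brightness": g.mu_g < 21.83 - 0.47 * color_mu,
        "c_dwarf": g.abs_g > -19.12 + 1.72 * (g.abs_g - g.abs_r),
        "d_star_forming": g.w_halpha > 50.0,
        "e_metal_poor": g.oh < 8.2,
        "f_not_agn": not g.is_agn,
        "g_no_companion": not g.has_bright_neighbor,
    }


def is_bcd(g):
    """A galaxy qualifies as a BCD only if every criterion holds."""
    return all(bcd_checks(g).values())


if __name__ == "__main__":
    # Made-up values for illustration, not taken from the paper
    toy = Galaxy(mu_g=21.0, mu_r=20.9, abs_g=-15.0, abs_r=-15.2,
                 w_halpha=120.0, oh=7.6, is_agn=False, has_bright_neighbor=False)
    print(bcd_checks(toy))
    print("BCD:", is_bcd(toy))
```

Returning the individual tests, rather than a single flag, makes it easy to see which
criterion a candidate fails, as happens with the four galaxies that miss criterion (b).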
Spurred by the unusual shape of these non-BCD galaxies, we carried out an eyeball
inspection of all the 32 targets. Many of them turned out to have distorted morphologies,
often with cometary shape, knots and/or chains of knots. Only 8 appear as a single knot
without obvious substructure. The results of this crude morphological classification are
given in Table 1. Figure 5 includes SDSS mugshots of all XMP candidates not marked as
single knot in Table 1. Those galaxies tagged with a question mark in Table 1 represent
ambiguous cases where the cometary shape is less obvious; however, even in these cases the
images show elongated asymmetrical substructure (e.g., #29).

The color-magnitude diagram in Fig. 4 deserves a separate comment. First, it shows
that several XMP candidates are extremely blue, reaching g - r < -1. These extreme colors
are rare, but they have been quoted in connection with luminous star-forming galaxies
(e.g., Izotov et al. 2011), and with blue compact dwarfs (e.g.,
Sánchez Almeida et al. 2008). Second, the figure suggests a trend that contrasts with the
usual behavior, where brighter galaxies tend to be redder (e.g., Blanton & Moustakas 2009):
the brighter the XMP candidate, the bluer. This unusual trend cannot be ascribed to
photometric errors (the error bars provided by SDSS are included in Fig. 4). The behavior
remains independently of the type of magnitude (Fig. 4 uses Petrosian magnitudes, but we
also tried the other magnitudes provided by SDSS; see Stoughton et al. 2002b). Moreover,
the trend disappears when colors other than g - r are used. This unusual trend calls for an
explanation but, so far, we can only offer conjectures. It may be a random fluctuation,
since only a handful of galaxies define the trend. Alternatively, it may be due to a subtle
effect on the integrated colors of the strong emission lines that dominate the spectrum of
these galaxies.

In short, all but four of the 32 candidates fulfill the criteria to be BCDs, and 24 of
them show cometary or knotted shape. The fact that the XMP candidates have these properties
is by no means trivial. We have selected our sample according to the form of their spectra
in a narrow spectral window around Hα, and this narrow bit of spectrum turns out to
determine many global properties of the galaxy such as color, compactness, star formation
rate, and even morphology.



4. Search for XMP galaxies in the literature 
Table 2. XMP targets found in the literature.

Name(a)        RA        DEC        g     SpecObjID(b)        12+log(O/H)  Comment(c)
UGC 12894      00 00 22  +39 29 44  -     -                   7.64
J0004+0025     00 04 22  +00 25 36  19.4  433417380224303104  7.37
J0014-0044     00 14 29  -00 44 44  18.7  306753510077104128  7.63
J0015+0104     00 15 21  +01 04 37  18.3  433983718529433600  7.07
J0016+0108     00 16 28  +01 08 02  18.9  193598873954942976  7.53
HS 0017+1055   00 20 21  +11 12 21  -     -                   7.63
J0029-0108     00 29 05  -01 08 26  19.2  434546692474273792  7.35
J0029-0025     00 29 49  -00 25 40  20.4  434546692394582016  7.29
ESO 473-G024   00 31 22  -22 45 57  -     -                   7.45
Andromeda IV   00 42 32  +40 34 19  -     -                   7.49
J0057-0022     00 57 13  -00 21 58  19.1  305062972235972608  7.60
IC 1613        01 04 48  +02 07 04  -     -                   7.64
J0107+0001     01 07 51  +00 01 28  19.4  422158628955357184  7.23
AM 0106-382    01 08 22  -38 12 34  -     -                   7.56
J0113+0052     01 13 40  +00 52 39  20.1  422158629852938240  7.24
J0119-0935     01 19 14  -09 35 46  19.5  185997585616470016  7.31
HS 0122+0743*  01 25 34  +07 59 24  15.7  655785970125242368  7.60
J0126-0038     01 26 46  -00 38 39  18.4  422724752210132992  7.51
J0133+1342     01 33 53  +13 42 09  18.1  120131172889001984  7.56
J0135-0023     01 35 44  -00 23 17  18.9  303656125474013184  7.38
UGCA20         01 43 15  +19 58 32  18.0  -                   7.60
UM133          01 44 42  +04 53 42  15.4  -                   7.63
HKK97L14       02 00 10  +28 49 53  -     -                   7.56
J0204-1009     02 04 26  -10 09 35  17.1  187686313107914752  7.36
J0205-0949     02 05 49  -09 49 18  15.3  187967849057222656  7.61
J0216+0115     02 16 29  +01 15 21  17.4  424413702440091648  7.44
096632         02 51 47  -30 06 32  16.3  -                   7.51

J0254-I-0035 


02 


54 


29 


-f 00 35 50 


19, 


,8 


425817950054776832 


7, 


,28 


J0301-0059 


03 


01 


26 


-00 59 26 


21, 


,5 


300559785177120768 


7, 


,64 


J0301-0052 


03 


01 


49 


-00 52 57 


18, 


,8 


300559785005154304 


7, 


,52 


J0303-0109* 


03 


03 


31 


-01 09 47 


19, 


,8 


225967511814799360 


7, 


,22 


J0313-I-0010 


03 


13 


02 


-1-00 10 40 


18, 


,9 


226248880952442880 


7, 


,44 


J0315-0024 


03 


16 


00 


-00 24 26 


20, 


,2 


426661932058017792 


7, 


,41 


UGC2684 


03 


20 


24 


-1-17 17 45 


22, 


,8 




7, 


,60 


SBS0335-052W 


03 


37 


38 


-05 02 37 


19, 


,0 




7, 


11 


SBS0335-052E 


03 


37 


44 


-05 02 40 


16, 


,3 




7, 


,31 


J0338-I-0013 


03 


38 


12 


-1-00 13 13 


24, 


,4 


227094714526990336 


7, 


,64 


J0341-0026 


03 


41 


18 


-00 26 28 


18, 


,8 


325892648806121472 


7, 


,26 


ESQ 358- G 060 


03 


45 


12 


-35 34 15 








7, 


,26 


G0405-3648 


04 


05 


19 


-36 48 49 








7, 


,25 


J0519-I-0007 


05 


19 


03 


-1-00 07 29 


18, 


,4 




7, 


,44 


To0618-402 


06 


20 


02 


-40 18 09 








7, 


,56 


ES0489-G56 


06 


26 


17 


-26 15 56 


15, 


,6 




7, 


,49 


J0808-I-1728* 


08 


08 


41 


-f 17 28 56 


19, 


,2 


585978593608204288 


7, 


,48 


J0812-I-4836 


08 


12 


39 


-1-48 36 46 


16, 


,0 


124071834843873280 


7, 


,28 


UGC 4305 


08 


19 


05 


-1-70 43 12 








7. 


,65 



van Zee < 



Guseva et al. (2009) 
Guseva et al. (2009) 
Guseva et al. (2009) 
Guseva et al. (2009) 
Fustilnik & Martin (2007) 
Guseva et al. (2009) 
Guseva et al. (2009) 
Skillman et al. (2003) 
Pustilnik et al. (2008) 
Guseva et al. (2009) 
Nagao et al. (2006_) 
Guseva et al. (2009) 
Lee et al. (2003) 
Izotov fc Thuan (2007) 



Ekta fc Chengalur (2010b) 
Izotov fc Thuan (2004b) 

Guseva et al. (2009) 
Papa dcros ct al. (2008) 
Guseva c t al. (2009) 
KO, van Zee . 



KO, Kniazev et al. (2001) 
van Zee fc Havnes (2006) 
Izotov fc Thuan (2007) 
JCniaze^_e^^l^ (2003 ) 
Guseva et al. (2009) 
Guseva et al. (2009) 
Izotov fc Thuan (2007) 

Guseva et al. (2009) 
Izotov fc Thuan (2007) 

Guseva e t al. (2009) 
Izotov fc T huan (2007) 
Guseva et al. (2009) 
KO, van Zee fc Havnes (2006) 
KO, Thuan fc Izotov (2005) 
KO, Thuan fc Izotov (2005) 
Guseva et al. (2009) 
Guseva et al. (2009) 

Lee et al. (2003) 
Guseva et al. (2009) 
Guseva et al. (2009) 
KO, Masegosa et al. (1994 ) 
KO, Roennback & Bergvall (1995) 
Brown ct al. (2008) 
Izotov fc Thuan (2007) 
Nagao et al. (2006) 
Fig. 3. - Oxygen abundance of the various classes as inferred using the N2 method on
the template spectra corresponding to each one of the classes. The two types of symbols
correspond to the two slightly different calibrations in Pettini & Pagel (2004) - asterisks for
equation (3) and boxes for equation (1). The horizontal dashed line represents one-tenth of
the solar metallicity (Asplund 2005).
Fig. 4. - Diagnostic plots to characterize the XMP galaxies resulting from the search. (a)
Color μ_g - μ_r vs surface brightness in the g filter μ_g. Only 4 out of the 32 galaxies have a
surface brightness beyond the slanted line that bounds BCD galaxies. These four outliers are
marked with times symbols (in red) in all the plots. (b) Color M_g - M_r vs absolute magnitude
M_g in the g filter. All the galaxies are consistent with being BCDs. The error bars have
been taken directly from SDSS. (c) Hα equivalent width vs absolute magnitudes. The star-
formation rate inferred from these equivalent widths exceeds in all cases the BCD threshold
(the dashed line). (d) BPT diagram showing all the targets to be starforming galaxies as
opposed to AGNs (separated by the dashed curve worked out by Kauffmann et al. 2003).
Fig. 5. - SDSS mugshots of all XMP candidates not marked as single knot in Table 1.
They are cometary (e.g., # 5), knotted-cometary (e.g., # 2), or doubtful (e.g., # 13). The
numbers correspond to the index in Table 1, whereas the scales on the top left corner of the
panels represent 2, 5 or 10 arcsec, as indicated by the insets. Panels include
SDSS J012534.19+075924.4, SDSS J015809.39+000637.2, SDSS J030331.27-010947.1,
SDSS J084236.58+103313.9, SDSS J094254.27+340411.8, SDSS J103137.27+043422.0,
SDSS J104457.79+035313.1, SDSS J111934.36+513012.1, and SDSS J114506.26+501802.3.
The figure continues in a second set of panels.

Fig. 5. - Continuation of Fig. 5.
Table 2 - Continued

Name(a)          RA        DEC        g     SpecObjID(b)        12+log(O/H)  Comment(c)
HS0822+3542      08 25 55  +35 32 31  17.8  ...                 7.35         KO, Kniazev et al. (2000)
DDO53            08 .. ..  +66 10 54  20.3  ...                 7.62         KO, van Zee et al. (2006)
UGC4483          08 37 03  +69 46 31  15.1  ...                 7.58         KO, Izotov & Thuan (2002)
HS0837+4717      08 40 30  +47 07 10  17.6  ...                 7.64         Pustilnik et al. (2004)
HS 0846+3522     08 49 40  +35 11 39  18.2  ...                 7.65         Pustilnik & Martin (2007)
J0859+3923       08 59 47  +39 23 06  17.2  ...                 7.57         Izotov & Thuan (2007)
J0910+0711       09 10 29  +07 11 18  16.9  336307481519063040  7.63         Ekta & Chengalur (2010b)
J0911+3135       09 11 59  +31 35 36  17.8  448054218540974080  7.51         Izotov & Thuan (2007)
J0926+3343       09 26 09  +33 43 04  17.8  448617233443192832  7.12         Pustilnik et al. (2010)
IZw18*           09 34 02  +55 14 25  16.4  156443095175528448  7.17         KO, Thuan & Izotov (2005)
J0940+2935       09 40 13  +29 35 30  16.5  546853820680896512  7.65         Izotov & Thuan (2007)
CGCG 007-025     09 44 02  -00 38 32  16.0  75094093385957376   7.65         Guseva et al. (2007)
SBS940+544       09 44 17  +54 11 34  19.1  ...                 7.46         KO, Guseva et al. (2001)
CS 0953-174      09 55 00  -17 00 00  ...   ...                 7.58         KO, Masegosa et al. (1994)
J0956+2849       09 56 46  +28 49 44  15.9  548261264221011968  7.13         Izotov & Thuan (2007)
LeoA             09 59 26  +30 44 47  19.0  ...                 7.30         KO, van Zee & Haynes (2006)
Sextans B        10 00 00  +05 19 56  20.5  ...                 7.53         Lee et al. (2006)
Sextans A        10 11 00  -04 41 34  ...   ...                 7.54         KO, Kniazev et al. (2005)
KUG 1013+381*    10 16 24  +37 54 44  15.9  401892408779866112  7.58         KO, Kniazev & Pustilnik (1998)
SDSS J1025+1402  10 25 30  +14 02 07  20.4  491964741116231680  7.36         Izotov et al. (2007a)
UGCA 211         10 27 02  +56 16 14  16.2  ...                 7.56         Cairos et al. (2010)
J1031+0434*      10 31 37  +04 34 22  16.2  162635977557278720  7.70         Izotov et al. (2007a)
HS 1033+4757     10 36 25  +47 41 52  17.5  271004930616066048  7.65         Pustilnik & Martin (2007)
J1044+0353*      10 44 58  +03 53 13  17.5  162917331083722752  7.44         Papaderos et al. (2008)
HS 1059+3934     11 02 10  +39 18 45  17.9  ...                 7.62         Pustilnik & Martin (2007)
J1105+6022       11 05 54  +60 22 29  16.4  218086199200317440  7.64         Kniazev et al. (2003)
J1119+5130*      11 19 34  +51 30 12  16.9  247359886311030784  7.51         KO, Kniazev et al. (2003)
J1121+0324       11 21 53  +03 24 21  18.1  235538034580258816  7.64         Kniazev et al. (2003)
UGC 6456         11 28 00  +78 59 39  ...   ...                 7.35         Nagao et al. (2006)
SBS1129+576      11 32 02  +57 22 46  16.7  ...                 7.36         Guseva et al. (2003c)
J1151-0222*      11 51 32  -02 22 22  16.8  93111671593107456   7.78         Nagao et al. (2006)
J1201+0211*      12 01 22  +02 11 08  17.6  145464500043120640  7.49         Papaderos et al. (2008)
SBS1159+545      12 02 02  +54 15 50  18.7  ...                 7.41         KO, Nava et al. (2006)
SBS 1211+540*    12 14 02  +53 45 17  17.4  287049377199947776  7.64         Nagao et al. (2006)
J1215+5223       12 15 47  +52 23 14  15.2  248767590061572096  7.43         Kniazev et al. (2003)
Tol1214-277      12 17 17  -28 02 33  ...   ...                 7.55         KO, Izotov et al. (2004)
VCC 0428         12 20 40  +13 53 22  17.0  497314452257898496  7.64         Vilchez & Iglesias-Paramo (2003)
HS 1222+3741     12 24 37  +37 24 37  17.9  564023911881113600  7.64         Popescu & Hopp (2000)*
Tol65            12 25 47  -36 14 01  17.5  ...                 7.54         KO, Izotov et al. (2004)
J1230+1202*      12 30 49  +12 02 43  16.7  454810434122285056  7.73         KO, Pustilnik et al. (2002)
KISSR 85         12 37 18  +29 14 55  19.9  ...                 7.61         Lee et al. (2004)
UGCA 292         12 38 40  +32 46 01  18.9  ...                 7.28         Pilyugin (2001)
HS 1236+3937     12 39 20  +39 21 05  18.5  ...                 7.47         Popescu & Hopp (2000)*
J1239+1456       12 39 45  +14 56 13  19.8  ...                 7.65         Brown et al. (2008)
SBS 1249+493     12 51 52  +49 03 28  18.0  ...                 7.64         Nava et al. (2006)
J1255-0213*      12 55 26  -02 13 34  19.1  95080395053203456   7.83         Yin et al. (2007)
In order to put our search into context, we carried out a careful revision of the
literature to select all the nearby galaxies with metallicity reported to be one-tenth solar or
less (12 + log(O/H) < 7.65). Those are listed in Table 2. We start off from Table 3 in
Kunth & Östlin (2000), who compile all the galaxies found in the literature prior to year 2000.
It comprises 31 targets, from which 5 were discarded because their metallicity has been
revised upwards. Then we cover the last ten years by checking cross-references existing in
recent known papers. We begin with the work by Guseva et al. (2009), which analyzes 44
targets selected in SDSS/DR6. The spectra used in the analysis are not spectra from SDSS but
obtained elsewhere. We choose from Guseva et al. (2009) all the galaxies with oxygen
abundance below the threshold. Then, using ADS^1, we revised all papers cited in Guseva et al.
(2009) and published after Kunth & Östlin (2000). Based on their title and abstract, we
examined those papers dealing with galaxy metallicity, separating the appropriate XMP
galaxies. High redshift targets were not included (e.g., those in Lilly et al. 2003). The procedure
was repeated with all the papers containing XMP galaxies, until no new reference earlier
than Kunth & Östlin (2000) was found. In addition, our recursive searching strategy was
repeated with all the papers citing Guseva et al. (2009). For example, Izotov & Thuan (2007)
is cited by Guseva et al. (2009) and provides 13 targets from SDSS/DR5. Papaderos et al.
(2008) study spectroscopically and morphologically 7 galaxies identified in SDSS/DR4 and
6dFGS (six-degree field galaxy survey, Jones et al. 200). Three galaxies are exhaustively
analyzed in three separate papers by Guseva et al. (2003a,b,c), and are also included
in Table 2. Two additional galaxies are from the study by Izotov et al. (2006a).
Through this step-by-step search procedure, we revised all the papers that seem to be
relevant, and a number of them rendered one or several additional XMP galaxies for
Table 2 (Popescu & Hopp 2000; Kniazev et al. 2000, 2001; Guseva et al. 2001; Pilyugin 2001;
Hidalgo-Gamez & Olofsson 2002; Izotov & Thuan 2002; Lee et al. 2003; Guseva et al. 2003c;
Skillman et al. 2003; Vilchez & Iglesias-Paramo 2003; Kniazev et al. 2003; Pustilnik et al. 200;
Izotov et al. 2004; Izotov & Thuan 2004; Thuan & Izotov 2005; Kniazev et al. 2005;
Papaderos et al. 2006; Nagao et al. 2006; van Zee & Haynes 2006; van Zee et al. 2006;
Nava et al. 2006; Lee et al. 2006; Pustilnik et al. 2006; Guseva et al. 2007; Pustilnik & Martin
2007; Izotov & Thuan 2007; Izotov et al. 2007a; Pustilnik et al. 2008; Papaderos et al. 2008;
Brown et al. 2008; Guseva et al. 2009; Pustilnik et al. 2010; Ekta & Chengalur 2010b;
Cairos et al. 2010).

Finally, the galaxies in Table 1 were inspected individually in NED^2 to ensure that they are
included in our bibliographic search even when the estimated metallicity exceeds the imposed
threshold. Table 2 contains all galaxies we found in the literature through this search. It

^1 The digital library NASA Astrophysical Data System http://www.adsabs.harvard.edu/
^2 NASA Extragalactic Database http://nedwww.ipac.caltech.edu/
Table 2 - Continued

Name(a)          RA        DEC        g     SpecObjID(b)        12+log(O/H)  Comment(c)
Gr8              12 58 40  +14 13 03  17.9  ...                 7.65         KO, van Zee & Haynes (2006)
KISSR 1490       13 13 16  +44 02 30  19.0  ...                 7.56         Lee et al. (2004)
DDO 167          13 13 23  +46 19 22  ...   ...                 7.20         Hidalgo-Gamez & Olofsson (2002)
HS 1319+3224     13 21 20  +32 08 25  18.6  ...                 7.59         Popescu & Hopp (2000)
J1323-0132*      13 23 47  -01 32 52  18.2  96204976459612160   7.78         Izotov et al. (2007a)
J1331+4151*      13 31 27  +41 51 48  17.1  412307391565004800  7.75         Izotov et al. (2007b)
ESO577-G27       13 42 47  -19 34 54  ...   ...                 7.57         KO, Roennback & Bergvall (1995)
J1355+4651*      13 55 26  +46 51 51  19.3  361921789577658368  7.63         Pustilnik et al. (2005)
J1414-0208       14 14 54  -02 08 23  18.0  258056040799535104  7.28         Papaderos et al. (2008)
SBS1415+437      14 17 01  +43 30 05  17.8  ...                 7.43         KO, Nava et al. (2006)
J1422+5145       14 22 51  +51 45 16  20.2  ...                 7.41         Brown et al. (2008)
J1423+2257*      14 23 43  +22 57 29  17.9  600334402043510784  7.72         Izotov et al. (2007b)
J1441+2914       14 41 58  +29 14 34  20.1  ...                 7.47         Brown et al. (2008)
HS1442+4250      14 44 13  +42 37 44  15.9  ...                 7.54         Guseva et al. (2003b)
J1509+3731*      15 09 34  +37 31 46  17.3  394011865673367552  7.85         Izotov et al. (2007b)
KISSR 666        15 15 42  +29 01 40  19.1  ...                 7.53         Nagao et al. (2006)
KISSR 1013       16 16 39  +29 03 33  18.2  396262437692637184  7.63         Nava et al. (2006)
J1644+2734       16 44 03  +27 34 05  17.7  475922385527111680  7.48         Izotov et al. (2007a)
J1647+2105*      16 47 11  +21 05 15  17.3  442143986740625408  7.75         Izotov et al. (2007b)
W1702+18         17 02 33  +18 03 06  18.4  ...                 7.63         Griffith et al. (2011)
HS 1704+4332     17 05 45  +43 28 49  18.4  ...                 7.55         Pustilnik & Martin (2007)
SagDIG           19 29 59  -17 40 41  ...   ...                 7.44         KO, van Zee et al. (2006)
J2053+0039       20 53 13  +00 39 15  19.4  288175754707992576  7.33         Guseva et al. (2009)
J2104-0035       21 04 55  -00 35 22  17.9  288457262211530752  7.05         Guseva et al. (2009)
J2105+0032       21 05 09  +00 32 23  19.0  288457264749084672  7.42         Guseva et al. (2009)
J2120-0058       21 20 26  -00 58 27  18.8  289300532823064576  7.65         Guseva et al. (2009)
HS2134+0400      21 36 59  +04 14 04  ...   ...                 7.44         Pustilnik et al. (2006)
J2150+0033       21 50 32  +00 33 05  19.3  414839880762261504  7.60         Guseva et al. (2009)
ESO146-G14       22 13 00  -62 04 03  ...   ...                 7.59         KO, Roennback & Bergvall (1995)
2dF 171716       22 13 26  -25 26 43  ...   ...                 7.54         Papaderos et al. (2006)
PHL293B          22 30 37  -00 06 37  17.2  ...                 7.62         Guseva et al. (2009)
2dF 115901       22 37 02  -28 52 41  ...   ...                 7.57         Papaderos et al. (2006)
J2238+1400*      22 38 31  +14 00 30  19.0  533060792673632256  7.45         Guseva et al. (2009)
J2250+0000       22 50 59  +00 00 33  19.8  190501187865280512  7.61         Izotov et al. (2007a)
J2259+1413       22 59 01  +14 13 43  19.1  ...                 7.37         Brown et al. (2008)
J2302+0049*      23 02 10  +00 49 39  18.8  190784502518251520  7.71         Guseva et al. (2009)
J2354-0005       23 54 37  -00 05 02  18.7  307596757485748224  7.35         Guseva et al. (2009)

(a) Those names marked with an asterisk correspond to galaxies also identified in this work (Table 1).
(b) Unique identifier of the galaxy spectrum in SDSS/DR7.
(c) Reference to the work where the oxygen abundance was obtained, plus the KO flag when the galaxy
belongs to the compilation by Kunth & Östlin (2000).
includes only 129 galaxies, proving that the XMP targets are really rare objects. The table
includes name, coordinates, SDSS SpecObjID (when available), the reference for the oxygen
metallicity, as well as a tag to identify galaxies included in the work by Kunth & Östlin (2000)
used as reference. The match between Tables 1 and 2 renders 21 objects, i.e., only 21 of the
32 candidates had been previously identified as XMP galaxies. They are marked in the tables
with asterisks. The remaining 11 objects are new. The overlap between the two tables
allows us to confirm the low metallicity of the XMP candidates selected using state-of-the-art
metallicity determinations. The oxygen abundances in Table 2 are based on self-consistently
determined electron temperatures. Assuming the twenty one known objects to be
representative of the full set, the mean metallicity turns out to be 12 + log(O/H) = 7.61 ± 0.19,
with the error bar accounting for the standard deviation.
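
The mean and scatter quoted above can be checked directly from the asterisked entries of Table 2. The following is only a minimal sketch: the abundances are transcribed from the table, the helper name `summarize` is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

# 12 + log(O/H) of the 21 entries of Table 2 whose names carry an asterisk,
# i.e. the literature galaxies also recovered by the search.
known_xmp = {
    "HS 0122+0743": 7.60, "J0303-0109": 7.22, "J0808+1728": 7.48,
    "IZw18": 7.17, "KUG 1013+381": 7.58, "J1031+0434": 7.70,
    "J1044+0353": 7.44, "J1119+5130": 7.51, "J1151-0222": 7.78,
    "J1201+0211": 7.49, "SBS 1211+540": 7.64, "J1230+1202": 7.73,
    "J1255-0213": 7.83, "J1323-0132": 7.78, "J1331+4151": 7.75,
    "J1355+4651": 7.63, "J1423+2257": 7.72, "J1509+3731": 7.85,
    "J1647+2105": 7.75, "J2238+1400": 7.45, "J2302+0049": 7.71,
}


def summarize(values):
    """Return (mean, sample standard deviation) of a set of abundances."""
    values = list(values)
    return mean(values), stdev(values)


avg, spread = summarize(known_xmp.values())
print(f"N = {len(known_xmp)}: 12 + log(O/H) = {avg:.2f} +/- {spread:.2f}")
```

With these 21 values the sketch returns 7.61 and 0.19 after rounding, in agreement with the figures in the text.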

Figure 6 shows the spatial distribution of the low metallicity targets, i.e., the 129
galaxies obtained from the literature plus the targets found in this work. Note that most
works have focused the search on very localized areas of the sky. For example, the targets by
Guseva et al. (2009) are clustered around the equator, even though the search is based on
SDSS/DR6, which has a much broader scope. Our systematic search, however, has selected
targets spread throughout the SDSS/DR7 field of view, which covers a significant part of the
sky (~ 20%).

Figure 7 shows diagnostic plots like Fig. 4 but for the galaxies in the literature with
spectra in SDSS. Assuming this subset to be representative of the full family, it shows several
systematic differences with respect to the targets we have selected. The main one has to do
with the ratio between [NII]λ6583 and Hα, which is larger than that characterizing our
targets (cf. Fig. 4d and Fig. 7d). Because of this difference, many of the XMP galaxies found in
the literature would have escaped our search, which picks out classes with extreme contrast
between [NII]λ6583 and Hα. We do not have a final explanation for the difference, but
we can offer two conjectures. Our search may yield galaxies in the low metallicity end of
the XMP family, i.e., even more extreme than those existing in the literature and,
consequently, even more metal poor than our estimates based on the semi-empirical calibrations
by Pettini & Pagel (2004). Alternatively, our search may be missing a fraction of the XMP
galaxies, where the difference between the [NII] lines and Hα is not so extreme. The two
possibilities explain other significant differences between our set and the targets from the
literature. Our galaxies have larger surface brightness than the XMP galaxies in
the literature (cf. Fig. 4a and Fig. 7a). They have larger Hα equivalent widths as well. The
surface brightness difference is attributable to the trend for the most metal poor galaxies to
be BCDs (see § 3). Alternatively, the physical conditions in the HII regions of the BCDs
may differ systematically from other galaxies so that the same low metallicities render a
particularly small ratio between [NII]λ6583 and Hα in BCDs. In order to sort out the two



Fig. 6. - Spatial distribution of XMP galaxies, both of the 129 found in the literature, plus
the 32 targets obtained in our systematic search through SDSS/DR7. Twenty one of them
coincide. Different symbols and colors represent different sources as coded in the inset (this
work and the literature sources listed in Table 2). In
particular, the square symbols correspond to the galaxies found in our search. The contours
represent the sky coverage of the spectroscopic sample of SDSS/DR7. Note that one of
our targets is outside the main spectroscopic sample of SDSS/DR7 (i.e., it lies outside the
contours). As usual, RA and DEC stand for right ascension and declination.
possibilities one would have to determine the metallicities of our targets in a self-consistent
way, measuring electron temperatures and excitations (as in, e.g., Stasińska 2004; Izotov et al.
2006b). Such a detailed analysis clearly goes beyond the scope of the paper, and it is planned
for follow-up work. Unfortunately, the twenty one candidates that were already known
to be XMP do not clarify the situation. Some have particularly low metallicities, but not all
of them (see the names with asterisks in Table 2).



5. Number density of XMP galaxies in the local universe 

One of the advantages of carrying out a systematic search in SDSS/DR7 is starting
from a magnitude limited sample. The relatively simple bias produced by this condition can
be corrected for and, thus, the selected XMP galaxies can be used to estimate the volume
number density of these unusual objects in the local universe. Such an estimate is described in
the present section.

Assume the metallicity X of all galaxies in SDSS/DR7 to be known. Then the number
of galaxies per unit volume and metallicity is just

    n(X) = \frac{1}{\Delta X} \sum_i \frac{1}{V_i} \Pi\left(\frac{X - X_i}{\Delta X}\right),    (5)

where the sum over i includes all the galaxies in the sample, X_i is the metallicity of the
i-th galaxy, \Delta X represents the bin size of the metallicity histogram, the symbol \Pi
stands for the rectangle function,

    \Pi(x) = \begin{cases} 1 & |x| < 1/2, \\ 0 & \text{elsewhere}, \end{cases}    (6)

and V_i represents the maximum volume in which the i-th galaxy of the sample could be
observed. Equation (5) represents the so-called approximation by Schmidt (1968) used
to determine the luminosity function of galaxies (e.g., Takeuchi et al. 2000), which has been
transformed to derive the number density of any other physical property of the galaxies
(e.g., Sanchez Almeida et al. 2008, § 4). Assume the sample to be magnitude limited, so
that all galaxies brighter than the apparent magnitude m_lim are included. (This is the way
the spectroscopic sample of SDSS was defined and therefore it must be a good approximation.)
Then,

    V_i = \frac{\Omega}{3} d_i^3,    (7)

with \Omega the solid angle covered by the survey and d_i the maximum distance at which the
i-th galaxy can be observed,

    \log(d_i / 1\,\mathrm{Mpc}) = (m_{\rm lim} - M_i)/5 - 5,    (8)
Fig. 7. - Similar to Fig. 4 but including previously known XMP galaxies with spectra in
SDSS/DR7. (a) Color μ_g - μ_r vs surface brightness in the g filter μ_g. Note that these targets
have systematically lower surface brightness than the ones we find, and many of them cannot
be considered BCDs according to the criteria used in § 3. (b) Color M_g - M_r vs absolute
magnitude M_g in the g filter. The error bars have been taken directly from SDSS.
(c) Hα equivalent width vs absolute magnitudes. (d) BPT diagram showing most of the
targets to be starforming galaxies. The outlier comes from the sample of special targets
with broad emission lines selected by Izotov et al. (2007a), where the integrated spectra are
known to have a significant SN and/or AGN contamination. In general, the ratio between
[NII]λ6583 and Hα is not as small as it is in our selection.
which only depends on its absolute magnitude M_i. A few comments and caveats are in order
before applying equation (5) to the dataset. First, SDSS is not truly magnitude limited.
Some bright galaxies are not observed because of problems to pack the spectrograph fibers
in crowded fields (see Stoughton et al. 2002a) and, as usual, very low surface brightness
galaxies tend to be missed even if they are luminous (e.g., Blanton et al. 2005). Neither of
these two problems seems to be of relevance for the XMP galaxies, which are both isolated
and high surface brightness (§ 3). SDSS/DR7 sets m_lim = 17.8 in r, but our selection
also imposes a cut at redshift < 0.25 which, in principle, modifies the magnitude-limited
character of the original sample. This question is of no concern, though. Our targets are
dwarf galaxies, never reaching a luminosity sufficient to be detected beyond the redshift
threshold. Finally, computing the galaxy metallicities using detailed chemical abundance
analyses (as in, e.g., Stasińska 2004; Izotov et al. 2006b) goes beyond the scope of the paper.
We estimate the metallicities using the strong line calibration by Pettini & Pagel (2004)
described in § 2. They are given in Table 1. Specifically, we employ the calibration in
equation (1), which seems to be more appropriate for XMP galaxies where N2 ~ -2 (N2 is
the logarithm of the ratio between the equivalent widths of the lines, as defined in equation 2).
The use of this approximation for the estimate of the metallicity implies that our estimates
are only indicative.
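
To give a feeling for the numbers involved, the sketch below evaluates the two N2 calibrations published by Pettini & Pagel (2004), a linear fit and a third-order fit. The coefficients are quoted from that paper; which of the two corresponds to equation (1) of this work is not assumed here, and the function names are hypothetical. Note also that Pettini & Pagel define N2 with line fluxes, whereas the text above uses equivalent widths.

```python
import math


def n2_index(ew_nii_6583, ew_halpha):
    """N2 as used in the text: log10 of the [NII]6583 / Halpha equivalent-width ratio."""
    return math.log10(ew_nii_6583 / ew_halpha)


def oh_linear(n2):
    """Linear N2 calibration of Pettini & Pagel (2004): 12 + log(O/H)."""
    return 8.90 + 0.57 * n2


def oh_cubic(n2):
    """Third-order N2 calibration of Pettini & Pagel (2004): 12 + log(O/H)."""
    return 9.37 + 2.03 * n2 + 1.26 * n2**2 + 0.32 * n2**3


# An XMP-like spectrum: [NII]6583 a hundred times weaker than Halpha (N2 = -2).
n2 = n2_index(1.0, 100.0)
print(f"N2 = {n2:.1f}: linear {oh_linear(n2):.2f}, cubic {oh_cubic(n2):.2f}")
```

For N2 = -2 both fits give 12 + log(O/H) between 7.7 and 7.8, i.e., close to one-tenth solar, while the linear fit evaluated at N2 = -0.3 gives about 8.7.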

The 32 XMP galaxies selected in § 2, together with equations (1), (5), (6), (7), and (8),
render the local density of galaxies with (oxygen) metallicity less than, approximately, one
tenth of the solar value, 12 + log(O/H) < 7.65,

    \int_{12+\log(\mathrm{O/H})<7.65} n(X)\,dX \simeq (1.32 \pm 0.23) \times 10^{-4}\ \mathrm{Mpc}^{-3}.    (9)

The error bar in the previous expression assumes the 32 targets to be drawn from a Poisson
distribution. In terms of the total number of galaxies, XMP galaxies are just

    0.10 ± 0.02 %,    (10)

where the total number of galaxies in the local universe (~ 0.13 Mpc^-3) has been taken from
the normalization of the luminosity functions by Blanton et al. (2003). These luminosity
functions are also based on SDSS, and we use a Hubble constant of 70 km s^-1 Mpc^-1 to
convert their normalization. The error bar in equation (10) considers the dispersion quoted
in equation (9) together with the spread of values among the different luminosity functions
corresponding to the five SDSS color filter bandpasses.
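
The arithmetic behind equations (9) and (10) is short enough to spell out. The sketch below assumes pure Poisson counting for the 32 targets, so that the relative error is 1/sqrt(32); the constant and function names are hypothetical, and the additional spread among luminosity functions that enters the error of equation (10) is not modeled.

```python
import math

N_XMP = 32            # XMP candidates selected in the search
DENSITY = 1.32e-4     # Mpc^-3, value in equation (9)
N_TOTAL = 0.13        # Mpc^-3, local galaxies (normalization from Blanton et al. 2003)


def poisson_relative_error(count):
    """Relative uncertainty of a Poisson-distributed count."""
    return 1.0 / math.sqrt(count)


density_error = DENSITY * poisson_relative_error(N_XMP)
fraction_percent = 100.0 * DENSITY / N_TOTAL

print(f"density error ~ {density_error:.2e} Mpc^-3")
print(f"XMP fraction  ~ {fraction_percent:.2f} %")
```

This gives an error of about 0.23 x 10^-4 Mpc^-3 and a fraction of about 0.10%, consistent with the quoted values.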

Going a step further, we have considered all the galaxies with emission lines in SDSS/DR7 
to estimate n{X) in a broader range of metallicities. The SDSS/DR7 data reduction pipeline 
provides the equivalent widths of [NII]A6583 and Ha, which we use to estimate oxygen abun- 
dances. The distribution function n{X) inferred from all these galaxies is shown in Fig. [8^, 
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the solid line. It has been computed using equation (|5]) with AX = 0.2, and considering only 
N2 < —0.3 (= 12 + log(0/H) < 8.7), with thi s upper limit forced by the validity range of the 



calibration used to estimate abundances (see iPettini fc Pagelll2004l ). The horizontal bar in 



Fig. 8a corresponds to the density in equation (9), assuming the 32 XMP galaxies to be spread
over one dex in metallicity. It is consistent with the independent estimate inferred from the
full distribution of SDSS/DR7 galaxies: if one integrates the histogram n(X) in Fig. 8a for
galaxies with metallicities smaller than one-tenth solar (12 + log(O/H) < 7.65), the volume
number density of XMP galaxies turns out to be ∫ n(X) dX ≃ 1.0 · 10⁻⁴ Mpc⁻³, which
is close to the figure in equation (9). The integral of n(X) in Fig. 8a for all metallicities
turns out to be ≃ 5.2 · 10⁻² Mpc⁻³, which is a factor of two smaller than, but consistent with,
the number density of local galaxies from Blanton et al. (2003) used in equation (10) for
normalization. One has to keep in mind that the distribution in Fig. 8a considers
neither galaxies without emission lines nor galaxies with super-solar metallicities, which
together can easily account for the factor of two difference.
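
As an illustration of how the abundances entering n(X) can be obtained from the pipeline equivalent widths, the following sketch assumes the linear N2 calibration of Pettini & Pagel (2004), 12 + log(O/H) = 8.90 + 0.57 N2. That form reproduces the correspondence between N2 = −0.3 and 12 + log(O/H) ≃ 8.7 quoted above, but the text does not state which form of the calibration was actually used. The function names are ours, and the ratio of equivalent widths is used as a proxy for the flux ratio of the two neighboring lines.

```python
import math

XMP_LIMIT = 7.65  # 12 + log(O/H) at one-tenth solar, as adopted in the text


def n2_index(ew_nii_6583, ew_halpha):
    """N2 = log10([NII]6583 / Halpha), with equivalent widths standing in for fluxes."""
    return math.log10(ew_nii_6583 / ew_halpha)


def oxygen_abundance_n2(n2):
    """Assumed linear calibration: 12 + log(O/H) = 8.90 + 0.57 * N2."""
    return 8.90 + 0.57 * n2


def is_xmp(ew_nii_6583, ew_halpha):
    """True when the N2-based abundance falls below the XMP limit."""
    return oxygen_abundance_n2(n2_index(ew_nii_6583, ew_halpha)) < XMP_LIMIT


# Abundance at the upper validity limit of the calibration, N2 = -0.3
print(f"{oxygen_abundance_n2(-0.3):.2f}")
```

With this calibration the XMP limit corresponds to N2 ≃ −2.2, i.e., an [NII]λ6583 line more than a hundred times weaker than Hα.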

XMP galaxies are dwarfs and therefore they tend to be underrepresented in surveys.
In order to illustrate the importance of this Malmquist bias, Fig. 8b contains the actual
histogram of observed galaxies N(X) from which n(X) was derived. In the parlance of
equation (5), it is given by

$$N(X) = \sum_i \Pi\left(\frac{X - X_i}{\Delta X}\right). \tag{11}$$

Note how the low metallicity tail of N(X) is depressed with respect to n(X). This difference
can be better appreciated in Fig. 8a, which also shows a scaled version of N(X) forced to
agree with n(X) at the bin of highest metallicity. The figure shows how XMP galaxies
are clearly lacking in the observation (the dashed line) as compared to their actual number
densities (the solid line). We find that only 0.01% of the observed galaxies with emission
lines are XMP, whereas they represent 0.2% considering galaxies in a fixed volume.
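
The difference between N(X) and n(X) can be sketched with a toy calculation. The code below assumes that equation (5) has the usual 1/V_max form, in which each galaxy contributes to its metallicity bin with weight 1/(V_max ΔX), whereas N(X) simply counts galaxies. The function name and the V_max values are hypothetical and chosen only to illustrate the effect.

```python
def histograms(metallicities, v_max, x_min, x_max, dx):
    """Return bin centers, raw counts N(X), and 1/V_max-weighted densities n(X)."""
    n_bins = int(round((x_max - x_min) / dx))
    counts = [0] * n_bins
    density = [0.0] * n_bins
    for x, v in zip(metallicities, v_max):
        k = int((x - x_min) // dx)          # index of the bin containing x
        if 0 <= k < n_bins:
            counts[k] += 1                   # observed histogram N(X)
            density[k] += 1.0 / (v * dx)     # galaxies per Mpc^3 per unit X
    centers = [x_min + (k + 0.5) * dx for k in range(n_bins)]
    return centers, counts, density


# Toy sample: a dwarf seen only nearby (small V_max) and a luminous galaxy
# seen throughout a much larger volume (large V_max); volumes in Mpc^3.
centers, counts, density = histograms(
    metallicities=[7.5, 8.5], v_max=[1.0e3, 1.0e6],
    x_min=7.0, x_max=9.0, dx=0.2,
)
```

Both toy galaxies contribute one count to N(X), yet the dwarf contributes a thousand times more to n(X), which is how the low metallicity tail ends up depressed in the observed histogram.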

Obviously, the above estimates assume the XMP galaxy set to be complete. However, 
we cannot discard the existence of missing targets (§|1]) and, if so, the figures we provide have 
to be scaled up accordingly. Although it is difficult to tell how much, we will try to make an 
educated guess. About sixty known XMP galaxies in the SDSS field of view have not been 
included in our selection (see Fig. E]). They may have been excluded for several reasons - 
because they are too faint, because noise in the spectra artificially enhances [NII]λ6583 with
respect to Hα, because [NII]λ6583 exceeds the limits of our selection, and possibly others.
Since some of them are proper reasons for exclusion, the number of sixty undetected galaxies 
may be taken as an upper limit for those missing. Consequently, the number densities we 
estimate may need to be increased, but only up to a factor of three. 
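
The factor of three presumably follows from comparing the 32 detected targets with the roughly sixty known XMP galaxies that were missed. A minimal sketch of that arithmetic, with the density value taken from the Conclusions and variable names of our own:

```python
N_FOUND = 32        # XMP candidates identified in this work
N_MISSED_MAX = 60   # known XMP galaxies in the SDSS field not selected (upper limit)
DENSITY = 1.32e-4   # Mpc^-3, number density assuming a complete sample

# Largest correction if every missed galaxy should have been included
max_scale = (N_FOUND + N_MISSED_MAX) / N_FOUND

print(f"maximum correction factor: {max_scale:.2f}")
print(f"density upper bound: {DENSITY * max_scale:.2e} Mpc^-3")
```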
[Figure 8, panels (a) and (b). Horizontal axis: 12 + log(O/H). Legend entries: N(X) scaled; N(X); Our targets.]

Fig. 8. (a) Volume number density of galaxies with a given metallicity as inferred from
the SDSS/DR7 galaxies with emission lines (n(X), the solid line). The bar represents the
average value inferred from the 32 XMP galaxies identified in this study. (b) Histogram of
galaxies with a given metallicity as directly observed in SDSS/DR7 (the dashed line). A
scaled-up version of this histogram is also shown as the dashed line in (a). Note how the
already scarce XMP galaxies, represented by the solid line, are hindered even further from
observation because they are dwarfs and, so, only observable in our neighborhood.
6. Conclusions

We have carried out a systematic search for extremely metal poor (XMP) galaxies among 
the spectroscopic sample of SDSS/DR7. These objects are rare, and have a clear cosmological 
interest as unevolved galaxies probably tracing physical conditions in an early phase of the 
universe (see §[T]). The search is based on the classification of a narrow spectral region around 
Hα, known to be particularly sensitive to the (HII gas) metallicity. Obviously, the almost
one million spectra in the database cannot be inspected individually, and we resort to the 
use of a standard automatic method of classification: k-means. After two nested runs of the 
procedure, and a subsequent cleaning up for artifacts created by the SDSS pipeline, we end 
up with 32 targets (§ IH Tabled]). They represent only 0.01% of the observed galaxies with 
emission lines. The final metallicity estimate remains pending; however, strong-line empirical
calibrations by Pettini & Pagel (2004) imply their oxygen metallicity to be of the order of
one-tenth solar. Obviously, our candidates must be studied in detail through imaging and
spectroscopy in follow-up work.

In order to put the work into context, we carried out a bibliographic search for galaxies 
with metallicity smaller than one-tenth the solar value (§ H]). We find only 129 (Table 2),
and only 21 of them overlap with our sample which, consequently, provides 11 new XMP
candidates. The oxygen metallicity of the 21 known targets turns out to be 12 + log(O/H) ≃
7.61 ± 0.19. These metallicities are based on electron temperatures and, thus, they are more
precise than the empirical calibrations we have been employing. Assuming this subset to be 
representative of the full sample, it confirms the XMP character of our targets. 

Our procedure is systematic and therefore, in principle, it should have identified all the
spectra in SDSS/DR7 where Hα ≫ [NII]λ6583. This assumption, together with the fact
that the SDSS spectroscopic sample is limited in apparent magnitude, allows us to estimate
the volume number density of XMP galaxies. Using the V_max approximation, we estimate it
to be (1.32 ± 0.23) · 10⁻⁴ Mpc⁻³. As far as we are aware, this is the first estimate of
this number density. The XMP galaxies represent 0.1% of the galaxies in the local volume,
or ~ 0.2% considering only emission line galaxies.

We analyze some of the physical properties of the candidates in § [31 All but four of our 
XMP candidates turn out to be blue compact dwarfs (BCDs). Note that this association is by 
no means trivial. We have selected our sample according to the shape of their spectra in a
narrow spectral window around Hα, and this narrow bit of spectrum turns out to determine
many global properties of the galaxy such as color, compactness, and star formation rate.
We do not know what causes the association between XMPs and BCDs. The fact that most metal
poor galaxies tend to be BCDs is known. Kunth & Östlin (2000) point it out, but warn
against a trivial interpretation since it may reflect an observational bias that makes it easier
to detect high surface brightness objects such as BCDs. Actually, there are XMP galaxies
whose surface brightness does not suffice to call them BCDs (see § H]). However, the fact
that our XMP candidates are also BCDs is revealing. The surface brightness where SDSS
starts having problems of completeness is about μ_g > 23.5 (Blanton et al. 2005), i.e., one
magnitude fainter than the faintest XMP candidate. Consequently, if we do not find low
surface brightness galaxies with [NII]λ6583 ≪ Hα, it is because they do not exist in SDSS/DR7.

Among the 32 XMP candidates, 24 either have a cometary shape or are formed
by chains of knots. Papaderos et al. (2008) already noticed the trend for XMP BCDs to
reveal a cometary morphology due to the presence of intense star formation at one edge
that gradually decreases. This shape is not unique to XMP galaxies, but it comprises only
10% of all BCDs (Loose & Thuan 1985), whereas it is dominant when they are metal poor
(Papaderos et al. 2008). The origin of the XMP shapes is also unclear. There are arguments
for gravitational triggering due to mergers with low-mass stellar or gaseous companions,
or for the self-propagation of star formation activity within a pre-existing gas rich galaxy
(Papaderos et al. 2008). Recently, Ekta & Chengalur (2010a) and Ekta et al. have
found that all XMP galaxies have distorted HI morphologies, which may indicate the infall of
external unenriched gas feeding the starburst and dropping the metallicity (e.g., Kewley et al.
2006). It may also be the signature of gas stripping forced by the interaction with an external
medium (e.g., Gavazzi et al. 2001; Elmegreen & Elmegreen 2010).



A concluding remark is in order. The methods for mining massive astronomical databases 
are still under development. The databases are simply too large for the traditional techniques
to be efficient. In this sense, the current paper presents a new approach that may be of 
interest beyond our particular application. The standard method to find XMP galaxies (or 
galaxies with any other property) would have been to set, beforehand, the observational 
criteria the targets should fulfill. Then those targets complying with the criteria would have 
been selected. Unfortunately, the criteria are often crude and have large uncertainties, which 
propagate into the selection as false objects sneaking in or true objects leaking out. Here 
the criteria have not been stipulated in advance. We classify the whole database, and only 
afterward do we select those galaxies belonging to the classes that have the intended property.
The search is comprehensive, and the results are robust against uncertainties in the selection 
criteria. 
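
The classify-first, select-afterward strategy can be sketched with a toy example. The code below is not the pipeline used in this work: it runs a plain k-means once (rather than the two nested runs described above) on made-up three-pixel "spectra" normalized to the Hα peak, and then selects the class whose mean spectrum has the weakest [NII]λ6583. All names, the toy fluxes, and the selection rule are assumptions made for illustration.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two spectra."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(spectra, k, n_iter=100, seed=0):
    """Plain k-means: returns the class label of each spectrum and the class centroids."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(spectra, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each spectrum goes to its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(s, centroids[j]))
                      for s in spectra]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean spectrum of its class
        for j in range(k):
            members = [s for s, lab in zip(spectra, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy spectra: fluxes at [NII]6548, Halpha, [NII]6583, normalized to Halpha
spectra = [
    [0.003, 1.0, 0.010], [0.005, 1.0, 0.015], [0.007, 1.0, 0.020],  # XMP-like
    [0.10, 1.0, 0.30], [0.11, 1.0, 0.33], [0.12, 1.0, 0.35], [0.13, 1.0, 0.40],
]
labels, centroids = kmeans(spectra, k=2)

# Selection happens only after classification: keep the class whose
# mean spectrum shows the weakest [NII]6583 (index 2 of each vector)
xmp_class = min(range(len(centroids)), key=lambda j: centroids[j][2])
selected = [i for i, lab in enumerate(labels) if lab == xmp_class]
print(selected)
```

No threshold on the line ratio enters the classification itself; the property of interest is read off the class centroids afterward, which is the point made in the paragraph above.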